


ReTR: Modeling Rendering Via Transformer for Generalizable Neural Surface Reconstruction

Neural Information Processing Systems

Generalizable neural surface reconstruction techniques have attracted great attention in recent years. However, they suffer from low-confidence depth distributions and inaccurate surface reasoning due to the oversimplified volume rendering process they employ. In this paper, we present Reconstruction TRansformer (ReTR), a novel framework that leverages the transformer architecture to redesign the rendering process, enabling complex rendering interactions to be modeled. It introduces a learnable meta-ray token and uses the cross-attention mechanism to model the interaction between the rendering process and the sampled points, rendering the observed color. Meanwhile, by operating in a high-dimensional feature space rather than the color space, ReTR mitigates sensitivity to the projected colors in the source views. These improvements yield accurate surface estimates with high confidence. We demonstrate the effectiveness of our approach on various datasets, showing that our method outperforms current state-of-the-art approaches in reconstruction quality and generalization ability.
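To make the rendering redesign concrete, here is a minimal single-ray NumPy sketch of attention-based rendering in the spirit of the abstract above: a learnable meta-ray token cross-attends to the features of the sampled points, and the resulting attention weights take the place of alpha-compositing weights. All names (`retr_style_render`, the projection matrices `Wq`, `Wk`, `Wv`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def retr_style_render(point_feats, depths, meta_ray_token, Wq, Wk, Wv):
    """Render one ray by cross-attending from a learnable meta-ray token
    to the features of the n sampled points (single head, illustrative).

    point_feats: (n, d) features of points sampled along the ray
    depths:      (n,)   depth of each sample
    """
    q = meta_ray_token @ Wq                 # query from the meta-ray token, (d,)
    K = point_feats @ Wk                    # keys for the sampled points, (n, d)
    V = point_feats @ Wv                    # values for the sampled points, (n, d)
    logits = K @ q / np.sqrt(q.shape[0])    # scaled dot-product scores, (n,)
    w = softmax(logits)                     # blending weights replacing alpha-compositing
    feat = w @ V                            # rendered ray feature (decoded to color downstream)
    depth = w @ depths                      # attention-weighted expected depth
    return feat, depth, w
```

Because the weights come from a learned attention over feature-space interactions rather than a fixed density-to-alpha formula, the weight distribution along the ray can concentrate sharply at the surface, which is the high-confidence depth behavior the abstract describes.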



GenS: Generalizable Neural Surface Reconstruction from Multi-View Images (Supplemental Material)

A. Implementation Details of the Network

Neural Information Processing Systems

The detailed network architecture is shown in Tab. 1. As shown in Tab. 2, we inject cost volumes into the representation. Here, we show more ablation studies in the dense setting (C.1, generalized multi-scale volume): ablating GMV, MFS, and VCL yields mean scores of 1.92, 1.08, 0.83, and 0.81. The "Base" in Tab. 5 is a model with only the generalized multi-scale volume, and "PC" stands for the model applying photometric consistency. The results show that plain photometric consistency does not work well for generalization training. Based on this intuition, we attempt to increase the receptive field of the image patches: we downsample the image in an early stage and then sample image patches for multi-view matching. We call this strategy multi-scale photometric consistency (MPC). Tab. 5 shows that enlarging the receptive field works well for our generalization training. Building on this, we adopt an FPN feature network to achieve our multi-scale feature-metric consistency, which simultaneously covers different ranges of receptive fields.
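The MPC idea described above (downsample first, then sample fixed-size patches, so that each patch covers a larger effective receptive field) can be sketched as follows. The helper names and the use of 2x2 average pooling are illustrative assumptions, and the homography warp between views is omitted for brevity, so the source patch is read at the same pixel coordinates:

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling (assumes even H and W)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def multiscale_photometric_error(ref, src, y, x, patch=3, scales=2):
    """Compare a patch around (y, x) between a reference and a source view
    at several image scales; because the patch size stays fixed while the
    image shrinks, coarser scales see a larger receptive field (the MPC idea)."""
    err = 0.0
    for _ in range(scales):
        r = patch // 2
        p_ref = ref[y - r:y + r + 1, x - r:x + r + 1]
        p_src = src[y - r:y + r + 1, x - r:x + r + 1]
        err += np.abs(p_ref - p_src).mean()   # per-scale photometric residual
        ref, src = downsample2x(ref), downsample2x(src)
        y, x = y // 2, x // 2                 # track the pixel across scales
    return err / scales
```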



GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

Neural Information Processing Systems

Combining the signed distance function (SDF) with differentiable volume rendering has emerged as a powerful paradigm for surface reconstruction from multi-view images without 3D supervision. However, current methods are impeded by lengthy per-scene optimization and cannot generalize to new scenes. Unlike coordinate-based methods that train a separate network for each scene, we construct a generalized multi-scale volume that directly encodes all scenes. Compared with existing solutions, our representation is more powerful: it recovers high-frequency details while maintaining global smoothness. Meanwhile, we introduce a multi-scale feature-metric consistency that imposes multi-view consistency in a more discriminative multi-scale feature space, which is robust to failures of photometric consistency. Moreover, the learnable features are self-enhancing, continuously improving matching accuracy and mitigating aggregation ambiguity.
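As a rough illustration of the multi-scale feature-metric consistency, the sketch below gathers learned features at a surface point's projections in several views and at several scales, and penalizes their disagreement. The function name, the variance-based distance, and integer-pixel sampling are simplifying assumptions, not GenS's exact formulation:

```python
import numpy as np

def feature_metric_consistency(feat_pyramids, coords_per_scale):
    """Multi-scale feature-metric consistency for one surface point.

    feat_pyramids:    list over scales of (V, H, W, C) feature maps,
                      one per source view
    coords_per_scale: list over scales of (V, 2) integer (y, x) pixel
                      locations where the point projects in each view
    """
    loss = 0.0
    for feats, coords in zip(feat_pyramids, coords_per_scale):
        # Gather the feature each view observes at the projected pixel.
        sampled = np.stack([feats[v, y, x] for v, (y, x) in enumerate(coords)])
        # Penalize disagreement across views (variance around the mean).
        loss += ((sampled - sampled.mean(axis=0)) ** 2).mean()
    return loss / len(feat_pyramids)
```

A true surface point projects to matching features in every view, so the loss is zero there and grows with feature disagreement; operating on learned features rather than raw colors is what makes the criterion robust where photometric consistency fails.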